[draft] Introduce the ADValue class for templated traceless forward mode #64
This is marked as draft, because there are several things that need to be discussed, details that may need to be changed, and features that may need to be added. This includes the following topics:
Note that the test failure is unrelated: this pull request does not change any of the existing code, and the failure also shows up in current master.
In general one would expect that it is beneficial to exploit symmetry for second order derivatives. To my surprise, this is slower for static dimension. My first guess was that computing only one triangle of the Hessian is slower because of the conditional index flip when requesting values. It turns out that this is not the case: even if one only queries the computed triangle and removes the index flip, this is still slower. To be precise, using

```cpp
for(int i=0; i<dim; ++i)
  for(int j=0; j<dim; ++j)
    ...
```

always produced faster code than

```cpp
for(int i=0; i<dim; ++i)
  for(int j=i; j<dim; ++j)
    ...
```

Surprisingly (in the static dim case), the following is still faster than the latter, but not fully on par with the former:

```cpp
for(int i=0; i<dim; ++i)
  for(int j=0; j<dim; ++j)
    if (j>=i)
      ...
```

I guess that the code for computing the full matrix benefits from auto-vectorization using SIMD operations. For dynamic dimension the situation is different: here exploiting symmetry was measurably faster.
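To make the "conditional index flip" discussed above concrete, here is a minimal sketch of packed storage for one triangle of a symmetric matrix. The class `SymmetricStorage` and its layout are hypothetical and only illustrate the idea; they are not taken from the `ADValue` implementation.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of ADValue): stores only the upper
// triangle (j >= i) of a symmetric dim x dim matrix, packed row by row.
template<class T, std::size_t dim>
class SymmetricStorage
{
public:
  // Access entry (i,j). Requests for the lower triangle are redirected
  // to the stored upper triangle by swapping the indices.
  T& operator()(std::size_t i, std::size_t j)
  {
    if (i > j)
      std::swap(i, j);  // the conditional index flip
    return data_[packedIndex(i, j)];
  }

  // Number of stored entries
  static constexpr std::size_t size() { return dim*(dim+1)/2; }

private:
  // Row i starts after i rows of lengths dim, dim-1, ..., dim-i+1
  static constexpr std::size_t packedIndex(std::size_t i, std::size_t j)
  {
    return i*(2*dim - i + 1)/2 + (j - i);
  }

  std::array<T, dim*(dim+1)/2> data_{};
};
```

With such a layout only `dim*(dim+1)/2` entries are computed and stored, at the price of a branch on every access and an irregular inner loop length, which may be harder for the compiler to vectorize than a full `dim x dim` loop.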
Another performance observation: one can always compute higher order derivatives using nested first order AD-values:

```cpp
// Raw input vector
using Vector = std::array<double,dim>;
// Nested 1st order univariate AD-value to compute mixed second order derivatives
using NestedAD = ADValue<ADValue<double,1,1>, 1, 1>;
// AD-aware vector
using ADVector = std::array<NestedAD, dim>;

Vector x = ...
ADVector x_ad;
for(int i=0; i<dim; ++i)
  for(int j=i; j<dim; ++j)
  {
    // Initialize values of AD-aware vector
    for(int k=0; k<dim; ++k)
      x_ad[k] = x[k];
    // Track (i,j)-th derivative
    x_ad[i].partial(0).partial() = 1;
    x_ad[j].partial().partial(0) = 1;
    auto y = f_raw(x_ad);
    // Store the mixed derivative in both triangles of the Hessian
    hessian[i][j] = y.partial(0).partial(0);
    hessian[j][i] = hessian[i][j];
  }
```

Surprisingly, in my tests this was on par with the dynamic size 2nd order variant.
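The nesting idea can be tried out without `ADValue` at all. The following is a minimal self-contained sketch using a hypothetical first order dual number type `Dual<T>` as a stand-in; the type, its member names, and the test function `f` are assumptions made for illustration and do not reflect the `ADValue` interface.

```cpp
#include <array>
#include <cstddef>

// Hypothetical minimal first order dual number (stand-in for a
// univariate first order AD-value). Not the ADValue interface.
template<class T>
struct Dual
{
  T value{};
  T deriv{};
  Dual() = default;
  Dual(const T& v) : value(v) {}  // constants carry a zero derivative
  Dual(const T& v, const T& d) : value(v), deriv(d) {}
};

template<class T>
Dual<T> operator+(const Dual<T>& a, const Dual<T>& b)
{ return {a.value + b.value, a.deriv + b.deriv}; }

template<class T>
Dual<T> operator*(const Dual<T>& a, const Dual<T>& b)
{ return {a.value*b.value, a.deriv*b.value + a.value*b.deriv}; }

// Example function f(x) = x0^2*x1 + x1^3, generic in the scalar type
template<class X>
auto f(const X& x)
{ return x[0]*x[0]*x[1] + x[1]*x[1]*x[1]; }

// Hessian via nested first order duals: one evaluation per entry (i,j), j>=i
template<std::size_t dim>
std::array<std::array<double,dim>,dim> hessian(const std::array<double,dim>& x)
{
  using Nested = Dual<Dual<double>>;
  std::array<std::array<double,dim>,dim> h{};
  for (std::size_t i=0; i<dim; ++i)
    for (std::size_t j=i; j<dim; ++j)
    {
      std::array<Nested,dim> x_ad;
      for (std::size_t k=0; k<dim; ++k)
        x_ad[k] = Nested(Dual<double>(x[k]));
      x_ad[i].deriv.value = 1;  // outer level differentiates w.r.t. x_i
      x_ad[j].value.deriv = 1;  // inner level differentiates w.r.t. x_j
      auto y = f(x_ad);
      h[i][j] = h[j][i] = y.deriv.deriv;  // mixed second derivative
    }
  return h;
}
```

For example, `hessian(std::array<double,2>{3.0, 2.0})` evaluates `f` three times, once per entry of the upper triangle. The cost per evaluation is that of a first order dual of first order duals, which is why a comparison against a dedicated second order type is interesting.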
Hmmm, interesting. Did your tests include higher dimensions and more complex functions? I'm not sure I understand what your actual test function looks like. I would not expect this to hold in general. Otherwise, it would be something to keep in mind for the optimization of ADOL-C. I would further expect this behavior to change for higher order (>2) derivative tensors as well.
Since we currently don't have a higher-order tape-less approach, it's very cool that your approach can compute higher-order derivatives. At some point this could be replaced by an interface that does not require nesting all types, as with the tape-based method of ADOL-C.
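I only tested 2nd order (more is not supported by […]). I basically used two test cases:

```cpp
#include <cmath>
#include <type_traits>
#include <dune/common/fmatrix.hh>
#include <dune/common/rangeutilities.hh>

auto f_raw = [](auto x) {
  // Bring the std overloads into scope; AD overloads are found via ADL
  using std::sin;
  using std::cos;
  using std::exp;
  using std::log;
  using std::pow;
  using std::fabs;
  using F = std::decay_t<decltype(x[0])>;
  // Exercise default construction, assignment, and construction from constants
  F c1;
  c1 = 13;
  F c2(42);
  F c3;
  c3 = 1;
  Dune::FieldMatrix<F, 3, 3> A;
  for(auto i : Dune::range(3))
    for(auto j : Dune::range(3))
      A[i][j] = sin(x[0]*i + cos(x[1])) + exp(cos(j*x[0]*x[1]))/(1+x[0]*x[0]+x[1]*x[1]) + 2/(1+x[0]*x[0]+x[1]*x[1]);
  F z = 1;
  z *= pow(A.determinant(), 3) + pow(2, sin(x[0]*x[1])) + c1 + c2 + pow(sin(x[0])+2., cos(x[1])+2);
  z *= c3;
  // Piecewise definition: nonzero only for 0 < x[0] <= x[1]
  if ((x[0]>0) and (x[0] <= x[1]))
    return F(z);
  else
    return F(0);
};
```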
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@           Coverage Diff           @@
##           master      #64   +/-   ##
=======================================
  Coverage   65.57%   65.57%
=======================================
  Files          51       52    +1
  Lines       26162    26165    +3
  Branches     1843     1843
=======================================
+ Hits        17156    17159    +3
  Misses       9006     9006
```
I just (force-) pushed an updated version. This improves many aspects:
… mode

This was initially developed within the Dune module dune-fufem (https://gitlab.dune-project.org/fufem/dune-fufem).

The class `adolc::ADValue<T,maxOrder,dim>` implements a traceless forward mode templated with respect to the underlying type `T`, the maximal tracked derivative order `maxOrder`, and the domain dimension `dim`. Currently only `maxOrder<=2` is implemented.

The special value `dynamicDim` selects a runtime dimension. In that case the default is to use `std::vector` as storage. If support for `boost::pool` is activated, this is used instead.

Notable differences to `adtl::adouble` are:

* Being able to use types other than `T=double`, which e.g. allows for low/high precision or interval arithmetic.
* Support for second order derivatives. The interface is also flexible enough for a later implementation of higher orders.
* Fixing the dimension at compile time allows the compiler to optimize better and generate faster code. Furthermore this guarantees thread-safety (x) by construction and allows using different dimensions at the same time within a single program.
* In the dynamic dimension case, the dimension is set per variable, which allows having different dimensions at the same time. This is implemented by using one `boost::pool` per dimension instead of a single one for a globally fixed dimension. This also avoids the memory leaks of `adtl::adouble` when changing the global dimension. Since the pools are `thread_local` instead of global, this again provides thread-safety (x).

The last point could probably also be implemented for `adtl::adouble`.

(x) "Thread-safety" here does _not_ mean that concurrent access to a single `ADValue` object is safe. Instead it means that concurrent access to different objects in different threads is safe and not prone to race conditions due to internally used global variables.

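As a rough illustration of why a compile-time dimension helps, here is a minimal first order sketch of a traceless forward-mode value. The type `ForwardValue`, the helper `independent`, and all member names are hypothetical and much simpler than `ADValue`; the point is only that all derivative data lives inside the object, with fixed-length loops and no global state.

```cpp
#include <array>
#include <cstddef>

// Hypothetical first order forward-mode value with compile-time dimension.
// All state is stored in the object itself: no tape, no global variables.
template<class T, std::size_t dim>
struct ForwardValue
{
  T value{};
  std::array<T, dim> grad{};
};

// Create the i-th independent variable with value v and unit seed
template<class T, std::size_t dim>
ForwardValue<T,dim> independent(const T& v, std::size_t i)
{
  ForwardValue<T,dim> x;
  x.value = v;
  x.grad[i] = T(1);
  return x;
}

// Sum rule; the loop length is known at compile time
template<class T, std::size_t dim>
ForwardValue<T,dim> operator+(const ForwardValue<T,dim>& a, const ForwardValue<T,dim>& b)
{
  ForwardValue<T,dim> r;
  r.value = a.value + b.value;
  for (std::size_t k=0; k<dim; ++k)
    r.grad[k] = a.grad[k] + b.grad[k];
  return r;
}

// Product rule
template<class T, std::size_t dim>
ForwardValue<T,dim> operator*(const ForwardValue<T,dim>& a, const ForwardValue<T,dim>& b)
{
  ForwardValue<T,dim> r;
  r.value = a.value*b.value;
  for (std::size_t k=0; k<dim; ++k)
    r.grad[k] = a.grad[k]*b.value + a.value*b.grad[k];
  return r;
}
```

A usage sketch: with `auto x = independent<double,2>(3.0, 0);` and `auto y = independent<double,2>(2.0, 1);`, the expression `x*y + x*x` carries both the function value and the full gradient. Because `T` is a template parameter, the same code would also work with `float` or another arithmetic-like type, and objects of different `dim` can coexist in one program.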
While higher order derivatives could also be computed using `adtl_hov::adouble`, this requires several calls to compute mixed derivatives which is significantly slower compared to computing all at once. Furthermore the latter cannot be used in concurrent threads and with different dimensions at once.
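To see why univariate higher order propagation needs several passes for mixed derivatives, consider one standard way of recovering them from directional second derivatives. This is a general identity given for illustration; it is an assumption that it reflects how one would use `adtl_hov::adouble` here. For a direction $d$, let $\varphi_d(t) = f(x + t d)$, so that $\varphi_d''(0) = d^T \nabla^2 f(x)\, d$.

```latex
\frac{\partial^2 f}{\partial x_i \partial x_j}(x)
  = \frac{1}{2}\Bigl( \varphi_{e_i + e_j}''(0) - \varphi_{e_i}''(0) - \varphi_{e_j}''(0) \Bigr),
  \qquad i \neq j .
```

Each $\varphi_d''(0)$ requires its own univariate pass, so a full Hessian in dimension $n$ needs $n(n+1)/2$ directions: $n$ unit directions $e_i$ plus $n(n-1)/2$ sums $e_i + e_j$.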
The test is still missing checks for many features.
Since #93 was merged, I rebased, such that the test for […]